Optimal Run-Time Tracing of Message-Passing Programs

Authors

  • Anish Karmarkar
  • Nitin Vaidya
  • Robert H B Netzer
Abstract

The widespread adoption of distributed computing has accentuated the need for an effective set of support tools to facilitate debugging and monitoring of distributed programs. For distributed programs, unfortunately, this is not a trivial task. Many distributed programs are inherently non-deterministic: two runs of the same program with the same input data may not result in the same execution sequence. Cyclic debugging is one of the most common debugging strategies. To allow cyclic debugging, messages are traced so that an execution can be repeated. In this paper, we define a race in the context of a message-passing program and present a simple proof that, in general, no algorithm can produce an optimal message trace (the least number of messages traced). We then present two tracing algorithms, Algorithm A and Algorithm B. Both algorithms trace messages at run-time, i.e., when a message is received at a process. Algorithm A traces messages optimally, given that tracing decisions are made at run-time, when no information about the future is available. Algorithm B improves on the storage requirement and execution time of Algorithm A, and is based on the observation that only (n-1) buffers are required per process for optimal run-time decision making, where n is the number of processes in the system. This algorithm is an improvement over the algorithm presented in [10], which does optimal tracing only when the races amongst messages are transitive.
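
The abstract does not reproduce the paper's race definition or either algorithm, so the following is only a minimal illustrative sketch of the kind of run-time check such tracing relies on. It assumes the common vector-clock formulation in which a later message races with an earlier one received by the same process when the earlier receive does not happen before the later send. The names `happened_before` and `races` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def happened_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def races(earlier_recv_clock: Sequence[int], later_send_clock: Sequence[int]) -> bool:
    """Assumed race test (not the paper's definition verbatim).

    The later message could have arrived first, and so races with the
    earlier one, unless the earlier receive happened before the later send.
    """
    return not happened_before(earlier_recv_clock, later_send_clock)


# Process P0 received a first message at vector time [1, 1, 0].
first_recv = [1, 1, 0]

# P2 sent a second message at [0, 0, 1], knowing nothing of that receive.
concurrent_send = [0, 0, 1]

# Alternatively, P2 sent at [1, 1, 2], after learning of P0's receive.
ordered_send = [1, 1, 2]

print(races(first_recv, concurrent_send))  # the two messages are unordered
print(races(first_recv, ordered_send))     # the send is causally after the receive
```

Under this assumption, only the first pair would need to be recorded in the trace, since the arrival order of the second pair is already fixed by causality.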

Similar resources

An Implementation of Race Detection and Deterministic Replay with MPI

The Parallel Debugging Tool (PDT) of the Annai programming environment is developed within the Joint CSCS-ETH/NEC Collaboration in Parallel Processing. Similarly to the other components of the integrated environment, PDT aims to provide support for application developers to debug portable large-scale data-parallel programs based on HPF, and message-passing programs based on the MPI standard. Fo...

Trace-Based Run-Time Analysis of Message-Passing Go Programs

We consider the task of analyzing message-passing programs by observing their run-time behavior. We introduce a purely library-based instrumentation method to trace communication events during execution. A model of the dependencies among events can be constructed to identify potential bugs. Compared to the vector clock method, our approach is much simpler and has in general a significantly lower r...

The Performance of Two Tracing and Replay Algorithms for Message-Passing Parallel Programs

Debugging parallel message-passing programs is complicated by the non-determinism that is inherent in those programs. Cyclical debugging, which is a proven method for sequential programs, often fails when debugging parallel programs because different executions of the same program may exhibit different behaviors due to non-determinism. Some approaches have been studied to remedy this problem. W...

Design and Implementation of a Distributed Monitor for Semi-on-line Monitoring of VisualMP Applications

A new application-level, software tracing monitor is designed and implemented for the VisualMP graphical parallel programming environment to support semi-on-line monitoring of message-passing programs in heterogeneous environments. We present the design aspects of the monitor and the main implementation issues.

The Distributed Application Debugger

Developing parallel programs which run on distributed computer clusters introduces additional challenges to those present in traditional sequential programs. Debugging parallel programs requires not only inspecting the sequential code executing on each node but also tracking the flow of messages being passed between them in order to infer where the source of a bug actually lies. This thesis foc...

Journal title:

Volume   Issue 

Pages  -

Publication date: 2007